Abstract. The circular median problem in the Double-Cut-and-Join (DCJ) distance asks to find, for
three given genomes, a fourth circular genome that minimizes the sum of its distances to the
three given ones. This problem has been shown to be NP-complete. We show here that, if the number
of vertices of degree 3 in the breakpoint graph of the three input genomes is fixed, then the problem is
tractable (see footnote 3).

1 Introduction 

Comparative genomics has been an important source of combinatorial and algorithmic questions during the 
last 20 years, especially the computation of genomic distances and ancestral genomes, as illustrated by the 
recent book of Fertin et al. [3]. Among these problems, the median problem is of particular interest: while 
the distance problem is tractable in many models, the median problem is its simplest natural extension 
(a distance is a function of two genomes, while the median score is a function of three genomes) and is 
computationally intractable in most models. Computing a median is at the heart of inferring gene order
phylogenies and ancestral gene orders [6,4,12]. This motivated research on tractability issues of genomic
median problems, well summarized in the recent paper [8], as well as on practical algorithms to address them
(see [9, 14, 10, 11] and references therein).

Roughly speaking, the median problem is as follows: given three genomes $G_1$, $G_2$ and $G_3$ and a genomic
distance model $d$, find a genome $M$ that minimizes the cost of $d$ over $G_1, G_2, G_3$, defined by $d(M, G_1) +
d(M, G_2) + d(M, G_3)$. It is in fact an ancestral genome reconstruction problem, as $M$ can be seen as the last
common ancestor of $G_1$ and $G_2$, with $G_3$ acting as outgroup (i.e. a genome whose last common ancestor with
$G_1$ and $G_2$ is an ancestor of $M$). In [8], Tannier et al. explored several variants, based on different models
of genomes (linear, circular or mixed, see Section 2) and of genomic distances (Breakpoint, Double-Cut-and-Join,
Reversals, ...). In particular, they showed that if $d$ is the Double-Cut-and-Join (DCJ) distance,
which is currently the most widely used genomic distance, then computing a circular or mixed median is
NP-complete. In fact, the only known tractable median problem is the mixed breakpoint median: $d$ is the
breakpoint distance and the median can contain both linear and circular chromosomes. From a combinatorial
point of view, the central object in the DCJ model is the breakpoint graph: the DCJ distance between two
genomes with $n$ genes is indeed easily obtained from the number of cycles and of paths containing an odd number
of vertices (odd paths) in this graph [13,1]. Recent progress in understanding properties of this graph,
and especially of the family of adequate subgraphs, led Xu to introduce algorithms to compute DCJ median
genomes which are efficient on real data, but do not define well characterized classes of tractable instances [9-12].

In the present work, we show the following result: if the breakpoint graph of three genomes contains a 
constant number of vertices of degree 3, then computing a DCJ circular median is tractable. To the best of 
our knowledge, this is the first result defining an explicit non-trivial class of tractable instances related to 
the DCJ median problem. In Section 2, we precisely define combinatorial representations of genomes, the
DCJ distance, breakpoint graphs and the problem we address here. In Section 3, we state and prove our
main result.

3 Version of November 28, 2011. This paper is currently under peer-review. The results appeared in Ahmad 
Mahmoody-Ghaidary, Tractability Results for the Double-Cut and Join Multichromosomal Problem, MSc thesis, 
Department of Mathematics, Simon Fraser University, 2011. 



2 Preliminaries 



Genes, genomes and breakpoint graph Let $A = \{1, 2, \dots, n\}$ represent a set of $n$ genes (see footnote 4). Each gene $i$ has a
head $i_h$ and a tail $i_t$. From now on, we assume $A$ always contains $n$ genes.

A genome $G$, with gene set $A$, is encoded by the order and orientation of its genes along its chromosomes
(i.e. its gene order), or equivalently by the set of the adjacencies between its gene extremities, that can
naturally be represented by a matching on the set of vertices $V(G) = \{i_h, i_t \mid 1 \leq i \leq n\}$ (Fig. 1 (a)). The
connected components of the graph whose vertices are $V(G)$ and edges are the disjoint union of the edges of
$G$ and the edges $\{i_t, i_h\}$ (forcing gene extremities for a given gene to be contiguous) form the chromosomes
of $G$ (Fig. 1 (b)). A chromosome is linear if it is a path and circular if it is a cycle. $G$ is circular if it
contains only circular chromosomes (perfect matching), linear if it contains only linear chromosomes, and
mixed otherwise. Fig. 1 (a,b) illustrates this view of genomes as matchings.

The breakpoint graph $B(G_1, \dots, G_m)$ of $m$ genomes $G_1, \dots, G_m$ on $A$ is the disjoint union of these
genomes, i.e. the graph with vertex set $V(A)$ and edges given by the matchings defining these $m$ genomes.
Following the usual convention, we consider that edges in this graph are colored, with color $c_i$ assigned to
genome $G_i$ ($1 \leq i \leq m$); it results that $B(G_1, \dots, G_m)$ can have multiple edges of different colors (see Fig. 1
(c)).







Fig. 1. (a) A genome on 4 genes, with two chromosomes, one circular chromosome with gene order (1 2) and one linear
chromosome with gene order (4 -3), where the sign - indicates a reverse orientation. (b) The same genome with
added dashed edges connecting gene extremities: every connected component of the resulting graph is a chromosome.
(c) The breakpoint graph of three genomes (whose edges are respectively light gray, thin black and thick black) on 4
genes.



DCJ distance and median Given two genomes $G$ and $M$ on $A$, with $M$ being a circular genome, the DCJ
distance $d_{DCJ}(G, M)$ is given by

$$d_{DCJ}(G, M) = n - c(G, M), \qquad (1)$$

where $c(G, M)$ is the number of cycles in $B(G, M)$. So a larger $c(G, M)$ implies a smaller distance $d_{DCJ}(G, M)$.
The general definition of the DCJ distance (when both $G$ and $M$ are mixed genomes) also requires considering
the odd paths (see footnote 5) of the breakpoint graph [1], but it is easy to see that the breakpoint graph does not contain
any odd path if at least one genome is circular. Note that the edges on a cycle in $B(G, M)$ are alternately from
$M$ and $G$. A $(G, M)$-alternating cycle is an even cycle with edges in $M$ and $G$ alternately. For simplicity,
we may sometimes only call such cycles alternating cycles.
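
As an illustration of formula (1), the following Python sketch counts the cycles of $B(G, M)$ and derives the DCJ distance to a circular genome. The encoding of a genome as a list of adjacencies between extremity labels such as "1h" and "1t", and the names `count_cycles` and `dcj_distance`, are assumptions made for this example only.

```python
def count_cycles(g_adj, m_adj):
    """Count the cycles of B(G, M), assuming M is circular (a perfect matching).

    Each genome is a list of adjacencies, i.e. pairs of gene extremities.
    """
    g = {}
    for a, b in g_adj:
        g[a], g[b] = b, a
    m = {}
    for a, b in m_adj:
        m[a], m[b] = b, a

    seen = set()
    cycles = 0
    for start in m:
        if start in seen:
            continue
        x = start
        while True:
            seen.add(x)
            y = m[x]          # follow the edge of M
            seen.add(y)
            z = g.get(y)      # then the edge of G, if y is not a telomere of G
            if z is None:     # the component is a path, not a cycle
                break
            if z == start:    # the alternating walk closed up: one cycle
                cycles += 1
                break
            x = z
    return cycles


def dcj_distance(n, g_adj, m_adj):
    """DCJ distance n - c(G, M) between G and a circular genome M on n genes."""
    return n - count_cycles(g_adj, m_adj)


# Circular genome (1 2) and the genome obtained by reversing gene 2.
circular_12 = [("1h", "2t"), ("2h", "1t")]
reversed_2 = [("1h", "2h"), ("2t", "1t")]
print(dcj_distance(2, circular_12, circular_12))
print(dcj_distance(2, reversed_2, circular_12))
```

The walk alternates between an edge of $M$ and an edge of $G$, so each cycle found is a $(G, M)$-alternating cycle in the sense defined above.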

A DCJ circular median for three genomes $G_1$, $G_2$, and $G_3$, or alternatively for their breakpoint graph
$B(G_1, G_2, G_3)$, is a circular genome $M$ which minimizes

$$\sum_{i=1}^{3} d_{DCJ}(G_i, M) = 3n - \sum_{i=1}^{3} c(G_i, M).$$

So a circular genome $M$ which maximizes the total number of $(M, G_i)$-alternating cycles (for
$i \in \{1, 2, 3\}$) is a DCJ circular median.

4 The term gene is used here in a generic way, and might include other genomic markers such as synteny/orthology
blocks for example.

5 An odd (resp. even) subgraph is a subgraph with an odd (resp. even) number of vertices.



Terminology From now on, by median we always mean DCJ circular median. We also denote by $m(B)$ the sum
$d_{DCJ}(G_1, M) + d_{DCJ}(G_2, M) + d_{DCJ}(G_3, M)$ for a median $M$.

Let $B = B(G_1, G_2, G_3)$ be a breakpoint graph, and let $M$ be a median of $B$. The graph $B_M(G_1, G_2, G_3) =
B \cup M$ (also denoted by $B_M$ when the context is clear) is called the median graph of $B$ with the DCJ circular
median genome $M$ (using disjoint union). The edges in $G_1 \cup G_2 \cup G_3$ are called colored edges, and edges in
$M$ are called median edges.

A $k$-cycle in $B_M$ is an $(M, G_i)$-alternating cycle of length $k$, for some $i$, in $B_M$. We denote the total
number of alternating cycles for a median graph $B_M$ of a breakpoint graph $B$ by $\mathrm{cyc}(B)$ (see footnote 6). If $H$ is a subgraph
of $B$, then $\mathrm{cyc}(H)$ is the maximum number of alternating cycles composed of edges in $H$, taken over all
matchings in $H$.

A terminal vertex in a graph is a vertex of degree 1. A subgraph of $B$ is said to be isomorphic to $C_k$
(resp. $P_k$) if it is a cycle (resp. path) on $k$ vertices.

Remark 1. The problem we consider in the present work is to compute a DCJ circular median of three given 
genomes, or equivalently to find a matching in B that maximizes the number of alternating cycles. From this 
point of view this is a purely graph theoretical problem that can be extended naturally to any edge-colored 
graph, with the convention that if the graph has an odd number of vertices, then exactly one vertex does 
not belong to the matching. 
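
To make the graph-theoretical formulation of Remark 1 concrete, here is a brute-force Python sketch that computes the maximum number of alternating cycles of a small edge-colored graph by enumerating all perfect matchings. It is exponential and only meant for checking small examples. It assumes an even number of vertices and that the edges of each color form a matching; the edge encoding `(u, v, color)` and all function names are choices made for this illustration.

```python
def perfect_matchings(vertices):
    """Yield every perfect matching on the list `vertices` (even length)."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def count_alternating_cycles(colored_edges, matching):
    """Count alternating cycles formed by `matching` with each color class."""
    mate = {}
    for u, v in matching:
        mate[u], mate[v] = v, u
    total = 0
    for color in {c for _, _, c in colored_edges}:
        nbr = {}
        for u, v, c in colored_edges:
            if c == color:
                nbr[u], nbr[v] = v, u
        seen = set()
        for start in nbr:
            if start in seen:
                continue
            x = start
            while True:
                seen.add(x)
                y = nbr[x]        # colored edge
                seen.add(y)
                z = mate[y]       # median edge
                if z == start:    # closed alternating cycle
                    total += 1
                    break
                if z not in nbr:  # no edge of this color at z: a path
                    break
                x = z
    return total


def cyc_bruteforce(vertices, colored_edges):
    """Maximum number of alternating cycles over all perfect matchings."""
    return max(count_alternating_cycles(colored_edges, m)
               for m in perfect_matchings(list(vertices)))


# A 4-cycle whose edge colors alternate between "a" and "b".
c4 = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 0, "b")]
print(cyc_bruteforce(range(4), c4))
```

Such an exhaustive search is useful as a reference when testing faster procedures on small instances.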

Shrinking in a breakpoint graph Shrinking a pair of vertices $\{u, v\}$ or an edge with end vertices $u$ and $v$
was defined in [9]. It consists of three steps: (1) removing all edges between $u$ and $v$ (if there are any), (2)
identifying the remaining edges incident to $u$ and $v$ that have the same color, (3) removing $u$ and $v$. We
denote the resulting graph by $B \cdot \{u, v\}$ (Fig. 2).
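
The three steps above can be sketched in Python as follows. The representation of the graph as a list of `(u, v, color)` triples and the function name `shrink` are assumptions of this example; the sketch also assumes that the edges of each color form a matching, as is the case in a breakpoint graph.

```python
def shrink(colored_edges, u, v):
    """Shrink the pair {u, v}.

    Returns (edges of the shrunk graph, number k of colored edges between u and v).
    """
    pair = {u, v}
    k = 0
    kept = []
    dangling = {}  # color -> {endpoint in {u, v}: endpoint outside {u, v}}
    for a, b, color in colored_edges:
        if {a, b} == pair:
            k += 1                                  # step (1): drop edges between u and v
        elif a in pair or b in pair:
            inside, outside = (a, b) if a in pair else (b, a)
            dangling.setdefault(color, {})[inside] = outside
        else:
            kept.append((a, b, color))
    for color, ends in dangling.items():
        if len(ends) == 2:                          # step (2): merge same-colored edges
            kept.append((ends[u], ends[v], color))
    return kept, k                                  # step (3): u and v are gone


# Shrinking two consecutive vertices of a 4-cycle with alternating colors.
c4 = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 0, "b")]
print(shrink(c4, 0, 1))
```

In this example the result is a 2-cycle on the vertices 2 and 3, and one colored edge between the shrunk vertices is reported.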



Proposition 1. Let $B$ be the breakpoint graph of genomes $G_1, \dots, G_m$, and $u, v \in V(B)$. Suppose that there
are $k$ colored edges between $u$ and $v$. If there exists a median $M$ containing the edge $uv$, then $\mathrm{cyc}(B) =
\mathrm{cyc}(B \cdot \{u, v\}) + k$.

Proof. Consider a median $M$ which contains the edge $uv$ (which implies that both $u$ and $v$ are in the same
alternating cycle in $B_M$). Let $B' = B \cdot \{u, v\}$, $M' = M - \{u, v\}$ (the graph obtained from $M$ by removing
$u$, $v$ and the edge $uv$).

Let $C$ be an alternating cycle in $B_M$. If $C$ does not contain $u$ and $v$, then, obviously, $C$ does not
contain any of the $k$ edges between $u$ and $v$. Thus, $C$ remains unchanged in $B'_{M'}$. Assume now that $C$
contains $uv$. If the length of $C$ is larger than 2, shrinking $\{u, v\}$ results in a cycle with smaller length in
$B'_{M'}$ (the length decreases by 2). Otherwise, if $C$ has length 2, it disappears in $B'_{M'}$. Thus the number of
alternating cycles which disappear in $B' \cup M'$ is $k$, since there are $k$ edges between $u$ and $v$. Therefore,
$\mathrm{cyc}(B) \leq \mathrm{cyc}(B \cdot \{u, v\}) + k$.

Now suppose $N'$ is a median of $B'$. By a similar argument, if $N = N' \cup \{uv\}$, then $B_N$ has $\mathrm{cyc}(B \cdot
\{u, v\}) + k$ alternating cycles. So, $\mathrm{cyc}(B) \geq \mathrm{cyc}(B \cdot \{u, v\}) + k$, and, as $\mathrm{cyc}(B) = \mathrm{cyc}(B \cdot \{u, v\}) + k$, we have
that $M'$ (resp. $N$) is a median of $B'$ (resp. $B$).

6 Note that $\mathrm{cyc}(B)$ does not depend only on the topology of $B$, but also on the colors of its edges. Moreover, for
different medians $M$ of $B$, $B_M$ has the same number of alternating cycles, so $\mathrm{cyc}(B)$ does not depend on a particular
median.

Fig. 2. Illustration of the shrinking of a pair $\{u, v\}$ of vertices of a breakpoint graph.



3 A class of tractable instances 



Our main theoretical result is the definition of a large class of tractable instances for the median problem, 
namely the ones whose breakpoint graph contains few vertices of degree 3. Obviously, the median problem 
for three genomes involves a breakpoint graph with maximum degree 3. We show here that the hardness of 
the problem is due to these vertices of degree 3. 

Theorem 1. Let $G_1$, $G_2$, and $G_3$ be three genomes. If there exists a median of $B = B(G_1, G_2, G_3)$ with at
most $\ell$ edges whose both end-vertices are of degree 3 in $B$, then computing such a median can be done in
time $O(n^3 \cdot (\ell + 1) \cdot (3^m \cdot m^{2\ell} + 1))$, where $m$ is the number of vertices of degree 3, and $n$ is the number of
genes in $G_1, G_2, G_3$.

Remark 2. Note that, as corollaries of this theorem, we have in particular that, 

1. if m is bounded, then computing a median is tractable, 

2. if $\ell$ is bounded, then computing a median is Fixed-Parameter Tractable (FPT) (see [7] for a reference
on FPT algorithms) with parameter $m$.

Moreover, if $m$ is not bounded, we can remove some edges incident to vertices of degree 3, so that in the
new instance the number of vertices of degree 3 is bounded. Now, by point 1 above, there is a polynomial 
time algorithm which computes the median of the new instance. 

Informally, to prove Theorem 1, we first consider the case where B is a collection of cycles and paths (i.e. 
has maximum degree 2) and show that a median can be computed in polynomial time. Next, we consider 
all possibilities (configurations) for matching vertices of degree 3 as median edges. For each configuration, 
we reduce the breakpoint graph by shrinking and removing some edges to obtain a graph whose connected 
components are paths or cycles. Having computed all possible configurations for vertices of degree 3 and
being able to compute a median for all resulting graphs leads to Theorem 1.

From now on, $G_1$, $G_2$, and $G_3$ are mixed genomes on $n$ genes, and $M$ is a median of these genomes, unless
otherwise specified. We denote their breakpoint graph by $B$, and the median graph by $B_M$.



3.1 Preliminary results 

We first introduce two useful lemmas that give lower bounds on the function $\mathrm{cyc}$ in various cases.

Lemma 1. If $B$ is isomorphic to $P_k$ or $C_{2k}$, for $k \geq 1$, then for every subgraph $H \subseteq B$, $\mathrm{cyc}(H) \geq |E(H)|/2$.

Proof. Consider the path $P_k = u_1 u_2 \cdots u_k$. Let $M$ be the matching consisting of the edges $u_1u_2, u_3u_4, \dots,$
and $u_{t-1}u_t$, where $t = 2\lfloor k/2 \rfloor$. Obviously, the number of alternating cycles in $P_k \cup M$ is $\lfloor k/2 \rfloor$, so $\mathrm{cyc}(P_k) \geq
\lfloor k/2 \rfloor \geq |E(P_k)|/2$. Similarly, $\mathrm{cyc}(C_{2k}) \geq k = |E(C_{2k})|/2$; see Fig. 3.

Any proper subgraph $H \subseteq P_k$ or $C_{2k}$ is a union of disjoint paths. If we take the union of the matchings
described above for each of these paths and call it $M$, there are at least $|E(H)|/2$ alternating cycles in
$H \cup M$. Therefore for any subgraph $H \subseteq B$, $\mathrm{cyc}(H) \geq |E(H)|/2$.



Definition 1. Let $S$ and $T$ be two subgraphs of $B$. $T$ is an alternating-subdivision of $S$ if we can obtain an
isomorphic copy of $T$ from $S$ as follows: subdivide each edge $e = \{a, b\}$ by an even (possibly zero) number of
vertices resulting in a path $a v_1 v_2 \cdots v_{2k} b$, then remove every second edge, i.e., $v_1v_2, v_3v_4, \dots, v_{2k-1}v_{2k}$. We
call the removed edges a completing matching for $T$ respective to $S$.



Fig. 3. Median edges (dashed) for cycles and union of disjoint paths. 



In the previous definition, note that there might be more than one way to obtain an isomorphic copy of
$T$ from $S$, and consequently, the completing matching is not necessarily unique.

Lemma 2. If $T$ is an alternating-subdivision of $S$, then $\mathrm{cyc}(T) \geq \mathrm{cyc}(S)$.

Proof. Let $M$ be a median of $S$, $M'$ an arbitrary completing matching for $T$ respective to $S$ and $M'' =
M \cup M'$. $M''$ is a perfect matching of $T$, and each alternating cycle in $S \cup M$ defines a unique alternating
cycle in $T \cup M''$, which implies that $\mathrm{cyc}(T) \geq \mathrm{cyc}(S)$ (see Fig. 4).




Fig. 4. (a) Obtaining T as an alternating-subdivision of S. (b) Obtaining a matching of T from a median of S (the 
dashed edges are the median edges and the edges of a completing matching for T respective to S). 



3.2 Independence of arbitrary paths and even cycles 

In this section we introduce the fundamental notion of independence of connected components of cycles and 
paths in a breakpoint graph. 

Definition 2. Let $H$ be a subgraph of $B$. An $H$-crossing edge in a median graph $B_M$ is a median edge which
connects a vertex in $V(H)$ to a vertex in $V(B) - V(H)$. An $H$-crossing cycle is an alternating cycle which
contains at least one $H$-crossing edge. The subgraph $H$ is $k$-independent if there is a median $M$ for $B$ such
that the number of $H$-crossing edges in $B_M$ is at most $k$.

Proposition 2. Let $H$ be a connected component of $B$. If $H$ is isomorphic to $P_{2k}$ or $C_{2k}$, for $k \geq 1$, then
$H$ is 0-independent.

Proof. Let $M$ be a median of $B$. Suppose $M$ has $\ell$ $H$-crossing edges in $B_M$. If $\ell = 0$, then we are done, so
assume that $\ell > 0$. Since $H$ has an even number of vertices, $\ell$ is even and $\ell \geq 2$. Because $H$ is a connected
component in $B$, each $H$-crossing cycle contains an even number of $H$-crossing edges.

Let $C_{M,H}$ be the set of all $H$-crossing cycles in $B_M$, and $E_{M,H}$ be the set of all $H$-crossing edges in $B_M$.
Let $X(M)$ be the set of colored edges in all cycles of $C_{M,H}$, and $Y(M)$ be the set of all $H$-crossing edges in
all cycles of $C_{M,H}$.

Case 1. If there is no $H$-crossing cycle, i.e., $C_{M,H} = X(M) = Y(M) = \emptyset$, we modify $M$ by removing all
$H$-crossing edges, and re-matching the vertices inside of $H$ together and outside of $H$ together. Since $\ell$ is
even, this is always possible and we get a median with no $H$-crossing edge.

Case 2. From now on, we assume that there exists at least one $H$-crossing cycle. The remainder of the proof
relies on a transformation on $M$ that reduces the number of edges in $H$-crossing cycles, leading to a median
with no $H$-crossing edge.

Step 1. The first step consists of choosing, for each $H$-crossing cycle, an arbitrary colored edge in $H$ incident
to an $H$-crossing edge from this cycle. Let $S$ be the subgraph of $B$ induced by these chosen colored edges
and $T = X(M) - S$.

Claim 1. $T$ is an alternating-subdivision of $S$. For a vertex $x \in V(S)$ let $x_M$ be the neighbor of $x$ in $M$.
If $u, v \in V(S)$ and $uv \in E(S)$ then, by definition, $uv$ is a colored edge of an $H$-crossing cycle which is
incident to an $H$-crossing edge. Therefore, there is an alternating path from $u_M$ to $v_M$, with alternating
colored and median edges from that cycle. If this path has $t$ colored edges, we subdivide the edge $uv$ using
$2t - 2$ vertices and remove every second edge. Proceeding in this way for every edge $uv \in E(S)$ we obtain
an alternating-subdivision $T$ of $S$.

Claim 2. $\mathrm{cyc}(T) \geq |C_{M,H}|/2$. First, as every colored edge is in at most one alternating cycle and two
edges of the same color are not incident to each other, $|E(S)| = |C_{M,H}|$. Also $S \subseteq H$, and by Lemma 1,
$\mathrm{cyc}(S) \geq |E(S)|/2$. Finally, from Lemma 2, $\mathrm{cyc}(T) \geq \mathrm{cyc}(S) \geq |E(S)|/2 = |C_{M,H}|/2$.

Step 2. Now we remove all the edges in $E_{M,H}$. Let $M_S$ be an arbitrary median of $S$, $M_T$ the matching
for $T$ defined by the union of $M_S$ and an arbitrary completing matching for $T$ respective to $S$, and $M' =
(M - Y(M)) \cup M_S \cup M_T$.

Claim 3. $M'$ is a median of $B$. First, by removing the edges in $Y(M)$, the total number of alternating
cycles decreases by $|C_{M,H}|$. Next, $M_S$ and $M_T$ contain at least $|C_{M,H}|/2$ alternating cycles each (Claim 2
above). Hence, the new matching $M'$ contains at least the same number of alternating cycles as $M$. By
definition of a median, $M'$ cannot contain more alternating cycles than $M$, so it contains the same number
of alternating cycles, and is a median of $B$. Note that this also implies that $\mathrm{cyc}(S) = \mathrm{cyc}(T) = |C_{M,H}|/2$.

Claim 4. $X(M') \subseteq X(M)$ and $X(M') \neq X(M)$. If there exists $e \in X(M') - X(M)$ then there would be at
least one $H$-crossing cycle induced by $M'$ which is not induced by $M_S$ or $M_T$; this implies $B_{M'}$ would contain
more alternating cycles than $B_M$, which contradicts the fact that $B_M$ and $B_{M'}$ have the same number of
alternating cycles. Next, $X(M') \subsetneq X(M)$, as $E(S) \subseteq X(M)$ and $E(S) \cap X(M') = \emptyset$ (the vertices in $S$ are
matched to themselves). Therefore, $|X(M')| < |X(M)|$.

By iterating the above steps we obtain a median with no crossing cycle. Then, by Case 1, we can modify
this median to a median without $H$-crossing edge.

Proposition 3. Let $H$ be a connected component of $B$. If $H$ is isomorphic to $P_{2k-1}$, for $k \geq 1$, then $H$ is
1-independent.

Proof. We follow the same proof strategy as for Proposition 2. The number of $H$-crossing edges is odd. If
there is no $H$-crossing cycle, we can remove an even number of them as in Case 1 of the proof of Proposition 2,
leaving only one $H$-crossing edge. Otherwise, if we assume that there are $H$-crossing cycles, we can apply the
transformation defined in Case 2 of the proof of Proposition 2. It has similar properties, as, from Lemma 1,
for every subgraph $H' \subseteq P_{2k-1}$, $\mathrm{cyc}(H') \geq |E(H')|/2$, which implies again that $\mathrm{cyc}(S) = \mathrm{cyc}(T) = |C_{M,H}|/2$.

Proposition 4. If B contains only cycles and paths, there exists a median of B in which even components 
have no crossing edge, and each odd path has exactly one crossing edge. 

Proof. This result follows from applying, on an arbitrary median graph, the transformation introduced in
the proof of Proposition 2 to each even/odd path or even cycle of the breakpoint graph, thereby reducing
the number of crossing edges for each of them, without increasing the number of crossing edges in other
components.

3.3 Alternating cycles for arbitrary paths and even cycles 

The results of the previous section open the way to computing a median of a breakpoint graph with maximum 
degree 2 by considering each path or even cycle independently, and matching odd paths into pairs (each 
defined by a single crossing edge). The main point of the current section is to show that paths and even 
cycles are easy to consider when computing a median. 

Proposition 5. If $H \subseteq B$ is isomorphic to $P_k$, for some $k \geq 1$, then $\mathrm{cyc}(H) = \lfloor k/2 \rfloor$. Moreover, there exists
a median whose edges in $H$ define $\lfloor k/2 \rfloor$ alternating 2-cycles, and one crossing edge incident to a terminal
vertex of $H$ if $k$ is odd.

Proof. From Lemma 1, $\mathrm{cyc}(H) \geq \lfloor k/2 \rfloor$. We use induction on $k$ to show that $\mathrm{cyc}(H) \leq \lfloor k/2 \rfloor$. This obviously
holds for $k = 1$. So we assume that $k \geq 2$, and consider a median $M$ for $H$. If there is no 2-cycle (an
alternating cycle consisting of two parallel edges) in $H_M$, each alternating cycle has length at least 4, and
hence at least 2 colored edges. So $\mathrm{cyc}(H) \leq |E(H)|/2 = (k-1)/2 \leq \lfloor k/2 \rfloor$.

Now assume that the median $M$ contains a 2-cycle, with vertices $u$ and $v$. Shrinking $\{u, v\}$ results in $H'$
that is either a single path with $k - 2$ vertices or two paths with $p$ and $q$ vertices such that $p + q = k - 2$. In
both cases, using induction and the fact that all paths are 0-independent or 1-independent, we can conclude
that,

- if $H'$ contains one path, $\mathrm{cyc}(H) = \mathrm{cyc}(H') + 1 \leq \lfloor (k-2)/2 \rfloor + 1 = \lfloor k/2 \rfloor$,

- if $H'$ contains two paths, $\mathrm{cyc}(H) = \mathrm{cyc}(H') + 1 \leq \lfloor p/2 \rfloor + \lfloor q/2 \rfloor + 1 \leq \lfloor k/2 \rfloor$.

To obtain a median with exactly $\lfloor k/2 \rfloor$ alternating cycles in $H$, we can simply define median edges by linking
successive vertices in $H$ (as in the proof of Lemma 1). If $k$ is odd this forces the unique $H$-crossing edge
(Proposition 3) to contain the last end vertex of $H$ (one of its two end vertices), which has no impact on the
number of alternating cycles as, by definition, this crossing edge will not belong to any alternating cycle.

Lemma 3. If $B$ is isomorphic to $C_{2k}$, for some $k \geq 1$, then either $\mathrm{cyc}(B) = k$ or $\mathrm{cyc}(B) = k + 1$.

Proof. Obviously, $\mathrm{cyc}(B) \geq k$. So, we assume that $\mathrm{cyc}(B) > k$. Let $M$ be an arbitrary median of $B$.
Following the proof of Proposition 5, if all alternating cycles in $B_M$ have length at least 4, then the number
of alternating cycles is at most $k$, so there must exist at least one 2-cycle in $B_M$. Let $uv$ be a colored edge
in a 2-cycle: $\mathrm{cyc}(B) = \mathrm{cyc}(B \cdot \{u, v\}) + 1$ (Proposition 1). Moreover $B \cdot \{u, v\}$ is a path, or a cycle, and
it is a cycle if and only if the two edges incident to the ends of $uv$ have the same color. If it is a path,
Proposition 5 implies that $\mathrm{cyc}(B \cdot \{u, v\}) = k - 1$ and $\mathrm{cyc}(B) = k$, which contradicts the assumption that
$\mathrm{cyc}(B) > k$. So $B \cdot \{u, v\}$ is a cycle and the edges incident to $uv$ have the same color. By induction on $k$ (note
that $\mathrm{cyc}(C_4) = 3$ and $\mathrm{cyc}(C_2) = 2$) we can find a median of $B \cdot \{u, v\}$ with $\mathrm{cyc}(B \cdot \{u, v\}) = (k - 1) + 1 = k$
or $\mathrm{cyc}(B \cdot \{u, v\}) = k - 1$ alternating cycles. Hence, $\mathrm{cyc}(B) = k + 1$ or $\mathrm{cyc}(B) = k$.

Some definitions below assume that cycles of B are oriented, so we assume from now that edges of every 
cycle of B are consistently oriented, clockwise or counterclockwise. Fig. 5 provides an illustration. 

Definition 3. A cycle $C_{2k}$ of $B$ is of the first kind if $\mathrm{cyc}(C_{2k}) = k$, and it is of the second kind if $\mathrm{cyc}(C_{2k}) =
k + 1$.

Definition 4. Let $C$ be a cycle of $B$. The signature of a vertex of $C$ is an ordered pair $(a, b)$ such that $a$
and $b$ are the colors of the edges incident to that vertex: $a$ is the color of the incoming edge and $b$ the color
of the outgoing edge. Two vertices $u$ and $v$ are diagonal if their signatures are of the form $(a, b)$ and $(b, a)$.

Fig. 5. Dashed edges are median edges: (Left) a cycle $C_6$ of the first kind; (Right) a cycle of the second kind;
the matched vertices are diagonal.



Definition 5. Let $M$ be a median of an even cycle $C$, and $uv$ and $u'v'$ be edges in $M$: $uv$ and $u'v'$ cross
if $u, u', v, v'$ appear in this order along $C$. A cross-free diagonal matching for $C$ is a matching whose edges
connect pairs of diagonal vertices and no two edges cross.



Lemma 4. Let B be isomorphic to an even cycle of the second kind, and M be a median of B. (1) Each 
edge in M joins two diagonal vertices, and (2) the edges in M do not cross. 

Proof. Let $B = C_{2k}$. We first prove point (1) by contradiction. Assume that $uv \in M$ and that $u$ and $v$
are not diagonal. Let $(a, b)$ and $(c, d)$ be the respective signatures of $u$ and $v$. Our assumption implies that
$(c, d) \neq (b, a)$, and we can distinguish two cases: $(a, b) = (c, d)$ and $(a, b) \neq (c, d)$.

- If $(a, b) = (c, d)$, by shrinking the pair $\{u, v\}$ we obtain a smaller cycle $C_{2k-2}$, and by Proposition 1,
$\mathrm{cyc}(B) = \mathrm{cyc}(C_{2k} \cdot \{u, v\}) = \mathrm{cyc}(C_{2k-2}) \leq \frac{2k-2}{2} + 1 = k$, which is a contradiction, since $B$ is of the
second kind. Note that in this case $u$ and $v$ cannot be consecutive vertices on $B$.

- If $(a, b) \neq (c, d)$, by shrinking the pair $\{u, v\}$, the resulting graph can be either a path with $2k - 2$
vertices, or a cycle and a path, together with $2k - 2$ vertices. In the first case, vertices $u$ and $v$ must
be consecutive on $B$. But now $\mathrm{cyc}(B) = \mathrm{cyc}(C_{2k} \cdot \{u, v\}) + 1 = \mathrm{cyc}(P_{2k-2}) + 1 = \frac{2k-2}{2} + 1 < k + 1$,
which is a contradiction, since $B$ is of the second kind. In the second case $\mathrm{cyc}(B) = \mathrm{cyc}(C_{2k} \cdot \{u, v\}) =
\mathrm{cyc}(C_\ell) + \mathrm{cyc}(P_m) \leq \frac{\ell}{2} + 1 + \lfloor m/2 \rfloor \leq \frac{2k-2}{2} + 1 < k + 1$, since paths are either 0- or 1-independent and
$\ell + m = 2k - 2$ (note that in the latter case $u$ and $v$ cannot be consecutive). This is again a contradiction,
as $B$ is of the second kind.

We now prove point (2). Since $B$ is of the second kind, as shown in the proof of Lemma 3, there is a 2-cycle
containing a colored edge $u'v'$. Moreover, by point (1), vertices $u'$ and $v'$ are diagonal. So $B \cdot \{u', v'\}$ is
isomorphic to $C_{2k-2}$ and it must be of the second kind, as otherwise $\mathrm{cyc}(C_{2k}) = \mathrm{cyc}(C_{2k-2}) + 1 = \frac{2k-2}{2} + 1 =
k < k + 1$. Obviously, $u'v'$ does not cross any median edge of $M$. By shrinking this pair and, by induction
on the length of the cycle, applied to $C_{2k} \cdot \{u', v'\}$, the proof is complete.

Lemma 5. Let $B$ be isomorphic to $C_{2k}$. $B$ is of the second kind if and only if there exists a matching $M$ of
$B$ that is cross-free diagonal.

Proof. The necessity follows from Lemma 4. Now assume that there exists a cross-free diagonal matching
$M$ on vertices of $B$. It is easy to see that $M$ contains at least one edge $uv$ where $u$ and $v$ are consecutive
on $B$ (note that $M$ is a perfect matching, since $B$ has an even number of vertices). If we shrink the pair $\{u, v\}$,
the resulting graph is $C_{2k-2}$ and the remaining edges of $M$ are a cross-free diagonal matching for $C_{2k-2}$. We
can complete the proof by induction on $k$, since $\mathrm{cyc}(C_{2k}) = 1 + \mathrm{cyc}(C_{2k-2}) = 1 + \frac{2k-2}{2} + 1 = k + 1$, and the
statement of the lemma is obviously true for $k = 1$ and $k = 2$.

Lemma 6. Let $B$ be isomorphic to $C_{2k}$. Deciding if $B$ admits a cross-free diagonal matching can be done
in time $O(k)$.

Proof. Let $B = v_1 v_2 \cdots v_{2k}$. We rely on a simple greedy algorithm, which is in fact a classical algorithm for
deciding if a circular parenthesis word is balanced; we present it for the sake of completeness.

The key point was given in the proof of Lemma 5: any cross-free diagonal matching contains at least one
pair of consecutive vertices that are matched. Given the circular nature of $B$, we can extend this property as
follows: if $u$ and $v$ are consecutive diagonal vertices and $B$ admits a diagonal cross-free matching, then there
exists a matching where $u$ and $v$ are matched. This leads immediately to a greedy algorithm that matches
such vertices as soon as they are visited, using a simple stack data structure:

1. Let $M = \emptyset$ be an empty matching.

2. Let $S$ be an empty stack.

3. For $j = 1$ to $2k$

(a) if the top element $v_i$ of $S$ is diagonal with $v_j$, pop it from the stack $S$ and add $\{v_i, v_j\}$ to
$M$.

(b) else, push $v_j$ on $S$.

4. If $S$ is empty, $B$ admits a cross-free diagonal matching, given by $M$; otherwise it does not
admit one.

The time complexity of this algorithm is obviously linear in k. 

Proposition 6. If $B$ is isomorphic to an even cycle of size $k$ ($k \geq 2$), then computing $\mathrm{cyc}(B)$ can be done
in time $O(k)$.

Proof. Immediate consequence of Lemma 3, Lemma 5, and Lemma 6. 
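
The greedy procedure of Lemma 6 and its use in Proposition 6 can be sketched in Python as follows. The input encoding is an assumption of this example: an even cycle is given as the list of its edge colors in cyclic order, where `colors[j]` is the color of the edge leaving vertex `j` and `colors[j - 1]` the color of the edge entering it. The function names are illustrative.

```python
def cross_free_diagonal_matching(colors):
    """Return a cross-free diagonal matching as index pairs, or None if none exists."""
    n = len(colors)

    def signature(j):
        # (incoming color, outgoing color); colors[-1] wraps around the cycle
        return (colors[j - 1], colors[j])

    stack, matching = [], []
    for j in range(n):
        if stack:
            a, b = signature(stack[-1])
            if signature(j) == (b, a):      # top of the stack is diagonal with j
                matching.append((stack.pop(), j))
                continue
        stack.append(j)
    return matching if not stack else None


def cyc_even_cycle(colors):
    """cyc of an even cycle: k + 1 if it is of the second kind, k otherwise."""
    k = len(colors) // 2
    second_kind = cross_free_diagonal_matching(colors) is not None
    return k + 1 if second_kind else k


print(cross_free_diagonal_matching(["a", "b", "a", "b"]))
print(cyc_even_cycle(["a", "b", "c", "a", "b", "c"]))
```

Each vertex is pushed and popped at most once, so the running time is linear in the length of the cycle, in line with Lemma 6.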

3.4 Proof of Theorem 1

We now have all the elements to prove our main result, Theorem 1. We first prove that computing a median 
of a breakpoint graph of maximum degree two is tractable. 

Lemma 7. If B has maximum degree 2, then there exists a median of B such that every odd connected 
component of B is connected by median edges to exactly one other odd connected component. 

Proof. Let $M$ be a median as described in the proof of Proposition 4: every even connected component has
no crossing edge and each odd path has exactly one crossing edge. Moreover, odd cycles have at least one
crossing edge.

Let $H$ be an odd connected component and $e$ one of its crossing edges, connecting $H$ to another odd
component $H'$. Shrinking $e$ results in $(H \cup H') \cdot e$, which is a set of even components and is therefore
0-independent. Moreover, as $H$ and $H'$ were distinct connected components of $B$, from Proposition 1 (with
$k = 0$), $\mathrm{cyc}((H \cup H') \cdot e) = \mathrm{cyc}(H \cup H')$.

Repeating this argument for other odd components and using the fact that the number of odd components is
even (because the number of vertices in the breakpoint graph is even) completes the proof.

Lemma 8. If $B$ has maximum degree 2 and consists of two odd connected components $H_1$ and $H_2$, of
respective sizes $k_1$ and $k_2$, then computing a median of $B$ can be done in time $O(k_1 k_2 (k_1 + k_2))$.

Proof. For parity reasons, a median $M$ contains at least one edge $e$ between $H_1$ and $H_2$ ($e$ is an $H_1$-crossing
edge). By shrinking $e$ we obtain either one even connected component or two even connected components,
and, from Proposition 4, we can compute a median for each connected component independently. This
computation requires linear time (Propositions 5 and 6). There are at most $k_1 k_2$ possible candidates for $e$.
Hence computing a median of $B$ is tractable in time $O(k_1 k_2 (k_1 + k_2))$.

Proposition 7. If $B$ is a breakpoint graph with $2n$ vertices with maximum degree 2, then computing a
median of $B$ can be done in $O(n^3)$.

Proof. We first consider the case where $B$ contains only odd connected components. We define a complete
edge-weighted graph $K_B$ as follows:

1. each connected component $C$ defines a vertex $v_C$;

2. each edge $\{v_C, v_D\}$ has weight $\mathrm{cyc}(C \cup D)$.

By Lemma 8, $K_B$ is computable in polynomial time. We claim it is computable in $O(n^3)$. Suppose $B$ has $t$
components and $n_1, \dots, n_t$ are the numbers of vertices in each component. So we have $n_1 + \dots + n_t = 2n$.
The time to construct $K_B$ is of order

$$\sum_{i<j} n_i \cdot n_j \cdot (n_i + n_j) = \sum_{i<j} \left(n_i^2 \cdot n_j + n_i \cdot n_j^2\right) \leq \frac{1}{3}\left((2n)^3 - (n_1^3 + \dots + n_t^3)\right) < 5n^3.$$

Finally, by Lemma 7 we only need to find a maximum weight matching for $K_B$, which can be done in
$O(n^3)$ by using Edmonds's algorithm [2].

If the breakpoint graph $B$ has maximum degree 2, its connected components are paths or cycles. From
Proposition 4 and Proposition 6 we can find the median edges for even components independently. Finally
for odd components we find the median edges as described in the first part of the proof.

Proof of Theorem 1. We now assume that B has maximum degree 3. 

The main idea is to consider all possibilities for matching the vertices of degree 3 of B. A vertex u of 
degree 3 can be matched in two ways. 

— If it is matched to another vertex of degree 3, by shrinking these two vertices we obtain a smaller graph 
with fewer vertices of degree 3, and, from Proposition 1, we know precisely the number of alternating 
cycles (here 2-cycles) lost in the shrinking process, given by the number of genome edges between the 
two shrunk vertices. 

— If it is matched to a vertex of degree less than 3, then one of the edges incident to u is not in any 
alternating cycle, and we can remove this edge and transform u into a vertex of degree 2 (Fig. 6). 




Fig. 6. The dashed edge is a median edge. The gray edge cannot be in any alternating cycle. 



Now for each i, $0 \le i \le \ell$, we can select 2i vertices among all m vertices of degree 3 (there are $O(m^{2i})$ 
possibilities), compute an arbitrary perfect matching on these 2i vertices, and, for each remaining 
vertex of degree 3, remove an edge incident to this vertex (there are $O(3^{m-2i})$ possibilities). The resulting 
breakpoint graph B' is of maximum degree 2 and a median can be computed in time $O(n^3)$; its number 
of alternating cycles needs only to be augmented by the number of edges between matched vertices of degree 
3 in B. 

The number of all such configurations is in $O((\ell + 1) \cdot (m^{2\ell} + 1) \cdot 3^m)$ (the term +1 is needed to account 
for the case m = 0), which leads to the stated complexity. 
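The enumeration described in this proof can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: degree-3 vertices are given as a list of hypothetical labels, each remaining vertex drops one of its three incident edges (encoded as an index 0, 1 or 2), and the helper names are invented here. The exact count uses binomial coefficients, which are bounded above by the $m^{2i}$ term used in the text.

```python
from itertools import combinations, product
from math import comb


def count_configurations(m, ell):
    """Exact number of configurations: sum over i of C(m, 2i) * 3^(m - 2i)."""
    return sum(comb(m, 2 * i) * 3 ** (m - 2 * i) for i in range(ell + 1))


def enumerate_configurations(deg3_vertices, ell):
    """Yield (matched, removed) pairs.

    `matched` is the tuple of 2i degree-3 vertices to be matched together;
    `removed` maps every other degree-3 vertex to the index (0, 1 or 2)
    of the incident edge that is deleted to bring its degree down to 2.
    """
    m = len(deg3_vertices)
    for i in range(ell + 1):
        if 2 * i > m:
            break
        for matched in combinations(deg3_vertices, 2 * i):
            rest = [v for v in deg3_vertices if v not in matched]
            for choice in product(range(3), repeat=len(rest)):
                yield matched, dict(zip(rest, choice))


# Hypothetical instance: four vertices of degree 3, at most one matched pair.
configs = list(enumerate_configurations(["a", "b", "c", "d"], 1))
```

Each yielded configuration corresponds to one reduced graph of maximum degree 2, on which the cubic-time procedure of Proposition 7 would then be run.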

4 Conclusion 



In this work, we characterized a large class of tractable instances for the DCJ median problem (with circular 
median and mixed genomes). In fact, we showed that only the vertices of degree 3 make the problem 
intractable. Also, when k edges are removed from the breakpoint graph to decrease its maximum degree, the cost 
of a median of the resulting graph is at most k plus the cost of a median of the original graph (i.e. the resulting 
cost minus k is a lower bound for the cost of a median of the original graph). Finally, we showed that there is 
an FPT algorithm for the DCJ median problem if there exists a median such that the number of its edges 
connecting two vertices of degree 3 is bounded. 

Our work also shows that multiple solutions (i.e. medians) are likely to occur when dealing 
with breakpoint graphs with long paths or even cycles, as we showed that such components can admit 
several optimal medians. Hence, our results, as they stand now, are of interest more for computing the score 
of a median than for computing actual medians that can be seen as realistic ancestral genomes. However, 
the problem of uniform sampling of optimal medians is worth exploring, as a first step even in the simpler 
setting of breakpoint graphs of maximum degree 2. 

From a theoretical point of view, our work raises several questions. First, it leaves open the possibility 
that the DCJ median problem is FPT. Using the number of vertices of degree 3 as a parameter is a natural 
approach, although this seems to be a difficult question to address. The next obvious problem is to extend our 
approach to the case of a mixed or linear median. This would require a better understanding of the combinatorics 
of odd paths in the breakpoint graphs in relation to medians. The simpler problem of finding an optimal way to 
remove exactly one edge from each circular chromosome of a circular median while minimizing the number 
of destroyed alternating cycles is also open. Extending our results to the related DCJ halving problem [8] is 
also a natural question. 

Another interesting question is about expanding the breakpoint distance toward the DCJ distance: for 
two genomes $G_1$ and $G_2$ on n genes, their breakpoint distance is equal to 

$$d_{BP}(G_1, G_2) = n - a(G_1, G_2) - \frac{e(G_1, G_2)}{2}.$$

The parameters $a(G_1, G_2)$ and $e(G_1, G_2)$ are also equal to the numbers of 2-cycles and 1-paths ($P_1$) in the 
breakpoint graph $B(G_1, G_2)$, respectively. The DCJ distance of these genomes is 

$$d_{DCJ}(G_1, G_2) = n - c(G_1, G_2) - \frac{p(G_1, G_2)}{2},$$

where $c(G_1, G_2)$ and $p(G_1, G_2)$ are the numbers of (even) cycles and odd paths in $B(G_1, G_2)$, respectively. 
This motivates us to define a dissimilarity function as follows: 

$$d_{(i,j)}(G_1, G_2) = n - c_i(G_1, G_2) - \frac{p_j(G_1, G_2)}{2},$$

where $c_i(G_1, G_2)$ is the number of (even) cycles with at most $2i$ vertices, and $p_j(G_1, G_2)$ is the number of 
odd paths with at most $2j - 1$ vertices. With this dissimilarity measure, the median problem is 
tractable when $i = j = 1$, since $d_{(1,1)} = d_{BP}$. By taking $i = j = \infty$ we have $d_{(\infty,\infty)} = d_{DCJ}$, and the median 
problem would be intractable. A natural question is then to understand for which values of i and/or j the 
median problem is tractable, or FPT. 
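As an illustration of this family of dissimilarities, the following sketch evaluates it from the component sizes of a breakpoint graph. The input representation (lists giving the number of vertices of each cycle and of each odd path) and the function name `dissimilarity` are assumptions made for this example, and the sample sizes are hypothetical.

```python
from fractions import Fraction
import math


def dissimilarity(n, cycle_sizes, odd_path_sizes, i, j):
    """Return n - c_i - p_j / 2 as an exact fraction.

    c_i counts the cycles with at most 2i vertices and p_j counts the
    odd paths with at most 2j - 1 vertices. Use math.inf for i or j to
    count every cycle or every odd path.
    """
    c_i = sum(1 for s in cycle_sizes if s <= 2 * i)
    p_j = sum(1 for s in odd_path_sizes if s <= 2 * j - 1)
    return n - c_i - Fraction(p_j, 2)


# Hypothetical breakpoint graph on n = 6 genes (12 vertices):
# cycles with 2, 2 and 4 vertices, odd paths with 1 and 3 vertices.
n = 6
cycles = [2, 2, 4]
odd_paths = [1, 3]

d_bp = dissimilarity(n, cycles, odd_paths, 1, 1)                 # breakpoint case
d_dcj = dissimilarity(n, cycles, odd_paths, math.inf, math.inf)  # DCJ case
```

Intermediate values of i and j interpolate between the two distances, since raising either parameter can only increase the number of counted components and therefore can only decrease the value.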